Input/Output

It is often necessary to import information from a variety of sources and to save the results. This notebook demonstrates a few ways of reading and writing files.

By the end of this file you should have seen simple examples of:

  1. Printing string output to the screen
  2. Reading and writing text files
  3. Reading and writing csv files
  4. Reading and writing binary files
  5. Reading and writing matlab files
  6. Reading and writing HDF5 files

Further reading:
http://docs.h5py.org/en/latest/index.html


In [1]:
# Python Imports:
import numpy as np
import scipy.io as sio
%cd datafiles
!ls


C:\BACKUP_AWAYFROMONEDRIVE\Blake Files\Learning\IntroScientificPythonWithJupyter\datafiles
01-data_write.hdf5
01-simpledata.csv
01-simpledata_write.bin
01-simpledata_write.csv
01-simplemat.mat
01-simplemat_write.mat
01-simpletext.txt
01-simpletext_write.txt
pandas_df1.csv
pandas_df1.h5
presentation.mplstyle

From standard input/keyboard:

A string can be defined directly in Python and printed to the screen:


In [2]:
kb_contents = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit.'
print(kb_contents)


Lorem ipsum dolor sit amet, consectetur adipiscing elit.
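
As a small extension of the cell above, the following sketch shows two common ways of controlling printed output: an f-string with a format specifier, and the `sep` and `end` arguments of `print`. The variable names and values are hypothetical and only serve the illustration.

```python
# Hypothetical values, used only to illustrate formatting.
label = 'elapsed time'
value = 0.012345

# f-string with a format specifier: 3 digits after the decimal point.
message = f'{label}: {value:.3f} s'
print(message)

# print() accepts several objects; 'sep' and 'end' control the separators.
print('a', 'b', 'c', sep=', ', end='.\n')
```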

Text (ascii) files:

The import of simple text files can be performed directly in python by creating a file object and operating on that object:


In [3]:
# Read line by line:
file_obj = open('01-simpletext.txt','r')
for line in file_obj:
    print(line)
file_obj.close()


Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam leo purus, interdum sed interdum quis, tincidunt ac nibh. 



Maecenas a purus massa. Nunc a augue augue. Donec in felis commodo lectus convallis elementum sed vitae ipsum.
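
The extra blank lines in the output above appear because each line read from the file keeps its trailing newline, and `print` then adds one of its own. A minimal sketch of one way to avoid this, using a hypothetical two-line sample file created in a temporary folder:

```python
import os
import tempfile

# Hypothetical two-line sample file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as file_obj:
    file_obj.write('first line\nsecond line\n')

stripped_lines = []
with open(path, 'r') as file_obj:
    for line in file_obj:
        # Each line still ends with '\n'; strip it before printing.
        stripped = line.rstrip('\n')
        stripped_lines.append(stripped)
        print(stripped)
```

Passing `end=''` to `print` would have a similar effect while leaving the line itself unchanged.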

In [4]:
# Use the read method:
file_obj = open('01-simpletext.txt','r')
file_contents = file_obj.read()
file_obj.close()
print(file_contents)


Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam leo purus, interdum sed interdum quis, tincidunt ac nibh. 

Maecenas a purus massa. Nunc a augue augue. Donec in felis commodo lectus convallis elementum sed vitae ipsum.

In [5]:
# Python 'with' statement automatically takes care of the close for us:
with open('01-simpletext.txt','r') as file_obj:
    print(file_obj.read())


Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam leo purus, interdum sed interdum quis, tincidunt ac nibh. 

Maecenas a purus massa. Nunc a augue augue. Donec in felis commodo lectus convallis elementum sed vitae ipsum.
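
To see that the `with` statement really does close the file, the `closed` attribute of the file object can be inspected inside and after the block. This is a small sketch using a hypothetical temporary file rather than the data files of this notebook:

```python
import os
import tempfile

# Hypothetical temporary file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')

with open(path, 'w') as file_obj:
    file_obj.write('Lorem ipsum')
    closed_inside = file_obj.closed   # still open inside the block

closed_after = file_obj.closed        # closed once the block has exited
print(closed_inside, closed_after)
```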

In [6]:
# Write to ascii files:
file_obj = open('01-simpletext_write.txt','w')
file_obj.write(file_contents)
file_obj.close()

# Or, alternatively:
with open('01-simpletext_write.txt','w') as file_obj:
    file_obj.write(file_contents)

# Check that our written output is good:
with open('01-simpletext_write.txt','r') as file_obj:
    print(file_obj.read())


Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam leo purus, interdum sed interdum quis, tincidunt ac nibh. 

Maecenas a purus massa. Nunc a augue augue. Donec in felis commodo lectus convallis elementum sed vitae ipsum.
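
Note that opening a file with mode `'w'` replaces any existing contents, while mode `'a'` appends to them. The sketch below illustrates the difference on a hypothetical temporary file:

```python
import os
import tempfile

# Hypothetical temporary file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'sample_write.txt')

with open(path, 'w') as file_obj:     # 'w' creates (or truncates) the file
    file_obj.write('first\n')
with open(path, 'a') as file_obj:     # 'a' adds to the end of the file
    file_obj.write('second\n')
with open(path, 'r') as file_obj:
    appended = file_obj.read()

with open(path, 'w') as file_obj:     # 'w' again: previous contents are lost
    file_obj.write('third\n')
with open(path, 'r') as file_obj:
    overwritten = file_obj.read()

print(repr(appended))
print(repr(overwritten))
```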

Comma Separated Values (.csv files):

Here, we import data separated by a particular delimiter, as in tsv or csv files:


In [7]:
# Creating a python list:
with open('01-simpledata.csv','r') as file_obj:
    file_contents = file_obj.read().split(',')
    
print(file_contents)


['1', '0.012', '2', '0.024\n2', '0.014', '4', '0.056\n3', '0.016', '8', '0.128\n4', '0.018', '16', '0.288\n5', '0.020', '32', '0.640\n6', '0.022', '64', '1.408\n7', '0.024', '128', '3.072\n8', '0.026', '256', '6.656\n9', '0.028', '512', '14.336\n10', '0.030', '1024', '30.720']
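
Splitting only on commas leaves the newline characters embedded in the list (for example `'0.024\n2'`), so the row structure is lost. One possible alternative is the `csv` module from the standard library, which splits on both rows and delimiters. The sketch below uses the first two rows of the data shown above, held in an in-memory `io.StringIO` object as a stand-in for the file:

```python
import csv
import io

# First two rows of the data shown above, held in memory for illustration.
sample = io.StringIO('1,0.012,2,0.024\n2,0.014,4,0.056\n')

# csv.reader yields one list of strings per row; convert each field to float.
rows = [[float(field) for field in row] for row in csv.reader(sample)]
print(rows)
```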

In [8]:
# Use numpy to read an array from a file (loadtxt returns floats by default)
file_contents = np.loadtxt('01-simpledata.csv', delimiter=",")
print(file_contents)


[[  1.00000000e+00   1.20000000e-02   2.00000000e+00   2.40000000e-02]
 [  2.00000000e+00   1.40000000e-02   4.00000000e+00   5.60000000e-02]
 [  3.00000000e+00   1.60000000e-02   8.00000000e+00   1.28000000e-01]
 [  4.00000000e+00   1.80000000e-02   1.60000000e+01   2.88000000e-01]
 [  5.00000000e+00   2.00000000e-02   3.20000000e+01   6.40000000e-01]
 [  6.00000000e+00   2.20000000e-02   6.40000000e+01   1.40800000e+00]
 [  7.00000000e+00   2.40000000e-02   1.28000000e+02   3.07200000e+00]
 [  8.00000000e+00   2.60000000e-02   2.56000000e+02   6.65600000e+00]
 [  9.00000000e+00   2.80000000e-02   5.12000000e+02   1.43360000e+01]
 [  1.00000000e+01   3.00000000e-02   1.02400000e+03   3.07200000e+01]]

In [9]:
# Save output of numpy array to csv file
file_contents_write = file_contents*2 #Double to differentiate read vs write data

np.savetxt('01-simpledata_write.csv', file_contents_write, fmt='%0.3f', delimiter=",")
# %0.3f specifies fixed-point notation with 3 decimal places
file_contents = np.loadtxt('01-simpledata_write.csv', delimiter=",")
print(file_contents)


[[  2.00000000e+00   2.40000000e-02   4.00000000e+00   4.80000000e-02]
 [  4.00000000e+00   2.80000000e-02   8.00000000e+00   1.12000000e-01]
 [  6.00000000e+00   3.20000000e-02   1.60000000e+01   2.56000000e-01]
 [  8.00000000e+00   3.60000000e-02   3.20000000e+01   5.76000000e-01]
 [  1.00000000e+01   4.00000000e-02   6.40000000e+01   1.28000000e+00]
 [  1.20000000e+01   4.40000000e-02   1.28000000e+02   2.81600000e+00]
 [  1.40000000e+01   4.80000000e-02   2.56000000e+02   6.14400000e+00]
 [  1.60000000e+01   5.20000000e-02   5.12000000e+02   1.33120000e+01]
 [  1.80000000e+01   5.60000000e-02   1.02400000e+03   2.86720000e+01]
 [  2.00000000e+01   6.00000000e-02   2.04800000e+03   6.14400000e+01]]
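
The `fmt` argument controls how each number is written to the file. As a rough illustration, the sketch below writes the same small array once with `%0.3f` (fixed-point) and once with `%0.3e` (scientific notation) and reads the raw text back. The array values and temporary file names are chosen only for this example.

```python
import os
import tempfile

import numpy as np

# Small illustrative array; file names below are hypothetical.
values = np.array([[2.0, 0.024], [2048.0, 61.44]])
folder = tempfile.mkdtemp()
fixed_path = os.path.join(folder, 'fixed.csv')
sci_path = os.path.join(folder, 'scientific.csv')

np.savetxt(fixed_path, values, fmt='%0.3f', delimiter=',')  # fixed-point
np.savetxt(sci_path, values, fmt='%0.3e', delimiter=',')    # scientific

with open(fixed_path, 'r') as file_obj:
    fixed_text = file_obj.read()
with open(sci_path, 'r') as file_obj:
    sci_text = file_obj.read()

print(fixed_text)
print(sci_text)
```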

Binary Files:

Binary files store the same information as text or csv files, but write the values directly as bytes rather than encoding them as ASCII characters. They are typically faster to read and smaller in size, but cannot be read in a typical text editor (notepad, vim, sublime, etc).

Note: be careful with numpy.fromfile and numpy.tofile: they write raw bytes with no record of dtype, shape or byte order, so the files are not platform independent!


In [10]:
# Read in the csv from the previous step:
file_contents = np.loadtxt('01-simpledata_write.csv', delimiter=",")
print(file_contents)


[[  2.00000000e+00   2.40000000e-02   4.00000000e+00   4.80000000e-02]
 [  4.00000000e+00   2.80000000e-02   8.00000000e+00   1.12000000e-01]
 [  6.00000000e+00   3.20000000e-02   1.60000000e+01   2.56000000e-01]
 [  8.00000000e+00   3.60000000e-02   3.20000000e+01   5.76000000e-01]
 [  1.00000000e+01   4.00000000e-02   6.40000000e+01   1.28000000e+00]
 [  1.20000000e+01   4.40000000e-02   1.28000000e+02   2.81600000e+00]
 [  1.40000000e+01   4.80000000e-02   2.56000000e+02   6.14400000e+00]
 [  1.60000000e+01   5.20000000e-02   5.12000000e+02   1.33120000e+01]
 [  1.80000000e+01   5.60000000e-02   1.02400000e+03   2.86720000e+01]
 [  2.00000000e+01   6.00000000e-02   2.04800000e+03   6.14400000e+01]]

In [11]:
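# Save as a binary file in NumPy's .npy format (np.savetxt would write plain text).
# Passing an open file handle keeps the .bin name; no delimiter is needed:
with open('01-simpledata_write.bin', 'wb') as file_obj:
    np.save(file_obj, file_contents_write*2)
file_contents = np.load('01-simpledata_write.bin')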

# The following is not recommended, as it is platform dependent:
#np.ndarray.tofile(file_contents_write, '01-simpledata_write.bin')
#file_contents = np.fromfile('01-simpledata_write.bin')

print(file_contents)


[[  4.00000000e+00   4.80000000e-02   8.00000000e+00   9.60000000e-02]
 [  8.00000000e+00   5.60000000e-02   1.60000000e+01   2.24000000e-01]
 [  1.20000000e+01   6.40000000e-02   3.20000000e+01   5.12000000e-01]
 [  1.60000000e+01   7.20000000e-02   6.40000000e+01   1.15200000e+00]
 [  2.00000000e+01   8.00000000e-02   1.28000000e+02   2.56000000e+00]
 [  2.40000000e+01   8.80000000e-02   2.56000000e+02   5.63200000e+00]
 [  2.80000000e+01   9.60000000e-02   5.12000000e+02   1.22880000e+01]
 [  3.20000000e+01   1.04000000e-01   1.02400000e+03   2.66240000e+01]
 [  3.60000000e+01   1.12000000e-01   2.04800000e+03   5.73440000e+01]
 [  4.00000000e+01   1.20000000e-01   4.09600000e+03   1.22880000e+02]]
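
To illustrate why raw byte dumps are fragile, the sketch below compares two round trips on a small made-up array. The raw bytes from `tobytes` carry no description of the data, so the dtype and shape have to be supplied again by hand, whereas the `.npy` format written by `np.save` records them in a header. An in-memory `io.BytesIO` buffer stands in for a file on disk.

```python
import io

import numpy as np

# Small illustrative array (values chosen only for this example).
original = np.array([[4.0, 0.048], [8.0, 0.056]])

# Raw bytes: compact, but dtype and shape must be supplied again by hand.
raw = original.tobytes()
from_raw = np.frombuffer(raw, dtype=original.dtype).reshape(original.shape)

# .npy format: the header records dtype, shape and byte order.
buffer = io.BytesIO()
np.save(buffer, original)
buffer.seek(0)
from_npy = np.load(buffer)

print(len(raw), from_npy.dtype, from_npy.shape)
```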

Matlab (.mat) files:

Variables can be generated and saved in matlab via:

testvar = magic(9)

save('01-simplemat.mat','testvar')

These can then be loaded via scipy.io (imported as sio here):


In [12]:
# Use scipy to read in .mat files:
mat_contents = sio.loadmat('01-simplemat.mat')

testvar = mat_contents['testvar']
print(testvar)


[[47 58 69 80  1 12 23 34 45]
 [57 68 79  9 11 22 33 44 46]
 [67 78  8 10 21 32 43 54 56]
 [77  7 18 20 31 42 53 55 66]
 [ 6 17 19 30 41 52 63 65 76]
 [16 27 29 40 51 62 64 75  5]
 [26 28 39 50 61 72 74  4 15]
 [36 38 49 60 71 73  3 14 25]
 [37 48 59 70 81  2 13 24 35]]

In [13]:
# Use scipy to write .mat files:
testvar_write = testvar*2 # Double to make read data different from write data

sio.savemat('01-simplemat_write.mat', {'testvar_write': testvar_write})

mat_contents = sio.loadmat('01-simplemat_write.mat')
testvar = mat_contents['testvar_write']
print(testvar) # Print the data read back from the file


[[ 94 116 138 160   2  24  46  68  90]
 [114 136 158  18  22  44  66  88  92]
 [134 156  16  20  42  64  86 108 112]
 [154  14  36  40  62  84 106 110 132]
 [ 12  34  38  60  82 104 126 130 152]
 [ 32  54  58  80 102 124 128 150  10]
 [ 52  56  78 100 122 144 148   8  30]
 [ 72  76  98 120 142 146   6  28  50]
 [ 74  96 118 140 162   4  26  48  70]]
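
The dictionary returned by `sio.loadmat` contains a few metadata entries (`__header__`, `__version__`, `__globals__`) in addition to the saved variables. A minimal sketch of separating the two, using a hypothetical temporary .mat file and a made-up array:

```python
import os
import tempfile

import numpy as np
import scipy.io as sio

# Hypothetical temporary .mat file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'example.mat')
sio.savemat(path, {'testvar': np.arange(6).reshape(2, 3)})

mat_contents = sio.loadmat(path)

# Keep only the user variables; metadata keys start with a double underscore.
user_vars = {key: val for key, val in mat_contents.items()
             if not key.startswith('__')}

print(sorted(mat_contents.keys()))
print(user_vars['testvar'].shape)
```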

HDF5 files

HDF5 (Hierarchical Data Format) is a file format that offers much more flexibility at the cost of a bit more complexity. It is ideal when the data would otherwise be spread over many small files. There are two main objects:

  • Groups: folder-like containers that work like Python dictionaries
  • Datasets: NumPy-like arrays

In [14]:
import h5py

In [15]:
# Load csv data:
data_csv = np.loadtxt('01-simpledata_write.csv', delimiter=",")

# Load mat data:
data_mat = sio.loadmat('01-simplemat_write.mat')['testvar_write']

# Load text data:
with open('01-simpletext.txt','r') as file_obj:
    data_txt = file_obj.read()

In [16]:
# Create a h5py file object:
with h5py.File("01-data_write.hdf5", "w") as file_obj:   
    # Use file_obj to create data sets
    
    # Create a dataset object and assign the values from data:
    dataset1 = file_obj.create_dataset("data", data = data_csv)

Check that the data has been written to the file by opening it:


In [17]:
with h5py.File("01-data_write.hdf5", 'r') as file_obj:
    print(file_obj["data"].name)
    print(file_obj["data"][()]) # [()] reads the entire dataset into memory


/data
[[  2.00000000e+00   2.40000000e-02   4.00000000e+00   4.80000000e-02]
 [  4.00000000e+00   2.80000000e-02   8.00000000e+00   1.12000000e-01]
 [  6.00000000e+00   3.20000000e-02   1.60000000e+01   2.56000000e-01]
 [  8.00000000e+00   3.60000000e-02   3.20000000e+01   5.76000000e-01]
 [  1.00000000e+01   4.00000000e-02   6.40000000e+01   1.28000000e+00]
 [  1.20000000e+01   4.40000000e-02   1.28000000e+02   2.81600000e+00]
 [  1.40000000e+01   4.80000000e-02   2.56000000e+02   6.14400000e+00]
 [  1.60000000e+01   5.20000000e-02   5.12000000e+02   1.33120000e+01]
 [  1.80000000e+01   5.60000000e-02   1.02400000e+03   2.86720000e+01]
 [  2.00000000e+01   6.00000000e-02   2.04800000e+03   6.14400000e+01]]

The "Hierarchical" part of the HDF5 file format provides groups, which act like Python dictionaries or 'folders' for the various Datasets.


In [18]:
# Re-open the same file. Note that mode "w" truncates the file,
# so the "data" dataset written earlier is discarded:
with h5py.File("01-data_write.hdf5", "w") as file_obj:

    # Create a group object, and create datasets underneath it:
    grp_nums = file_obj.create_group("Numbers")
    dataset_csv = grp_nums.create_dataset("CSV", data=data_csv)
    dataset_mat = grp_nums.create_dataset("MAT", data=data_mat)

    # Create a second group object, and create datasets underneath it:
    grp_txt = file_obj.create_group("Text")
    txt_hf5 = np.asarray(data_txt, dtype="S") # Convert to a NumPy byte-string (S) dtype
    dataset_txt = grp_txt.create_dataset("lorem", data=txt_hf5)

After saving this data, check the file structure:


In [19]:
def print_attrs(name, obj): # Function that prints the name and object
    print(name)
    print(obj)
        
with h5py.File("01-data_write.hdf5", 'r') as file_obj:
    file_obj.visititems(print_attrs) # Use .visititems to get info


Numbers
<HDF5 group "/Numbers" (2 members)>
Numbers/CSV
<HDF5 dataset "CSV": shape (10, 4), type "<f8">
Numbers/MAT
<HDF5 dataset "MAT": shape (9, 9), type "|u1">
Text
<HDF5 group "/Text" (1 members)>
Text/lorem
<HDF5 dataset "lorem": shape (), type "|S231">

In [20]:
with h5py.File("01-data_write.hdf5", 'r') as file_obj:
    # [()] reads the entire dataset into memory
    print(file_obj["/Numbers/CSV"].name)
    print(file_obj["/Numbers/CSV"][()])
    
    print(file_obj["/Numbers/MAT"].name)
    print(file_obj["/Numbers/MAT"][()])
    
    print(file_obj["/Text/lorem"].name)
    print(file_obj["/Text/lorem"][()])


/Numbers/CSV
[[  2.00000000e+00   2.40000000e-02   4.00000000e+00   4.80000000e-02]
 [  4.00000000e+00   2.80000000e-02   8.00000000e+00   1.12000000e-01]
 [  6.00000000e+00   3.20000000e-02   1.60000000e+01   2.56000000e-01]
 [  8.00000000e+00   3.60000000e-02   3.20000000e+01   5.76000000e-01]
 [  1.00000000e+01   4.00000000e-02   6.40000000e+01   1.28000000e+00]
 [  1.20000000e+01   4.40000000e-02   1.28000000e+02   2.81600000e+00]
 [  1.40000000e+01   4.80000000e-02   2.56000000e+02   6.14400000e+00]
 [  1.60000000e+01   5.20000000e-02   5.12000000e+02   1.33120000e+01]
 [  1.80000000e+01   5.60000000e-02   1.02400000e+03   2.86720000e+01]
 [  2.00000000e+01   6.00000000e-02   2.04800000e+03   6.14400000e+01]]
/Numbers/MAT
[[ 94 116 138 160   2  24  46  68  90]
 [114 136 158  18  22  44  66  88  92]
 [134 156  16  20  42  64  86 108 112]
 [154  14  36  40  62  84 106 110 132]
 [ 12  34  38  60  82 104 126 130 152]
 [ 32  54  58  80 102 124 128 150  10]
 [ 52  56  78 100 122 144 148   8  30]
 [ 72  76  98 120 142 146   6  28  50]
 [ 74  96 118 140 162   4  26  48  70]]
/Text/lorem
b'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam leo purus, interdum sed interdum quis, tincidunt ac nibh. \n\nMaecenas a purus massa. Nunc a augue augue. Donec in felis commodo lectus convallis elementum sed vitae ipsum.'
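
Because groups behave like dictionaries and datasets like NumPy arrays, the usual Python idioms apply: `keys()` lists the members of a group, and slicing a dataset reads only the requested part. The following is a minimal sketch on a hypothetical temporary file; the group and dataset names are made up for the illustration.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical temporary HDF5 file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'example.hdf5')

with h5py.File(path, 'w') as file_obj:
    grp = file_obj.create_group('Numbers')
    grp.create_dataset('squares', data=np.arange(5) ** 2)

with h5py.File(path, 'r') as file_obj:
    group_keys = list(file_obj.keys())              # dictionary-like access
    member_keys = list(file_obj['Numbers'].keys())
    shape = file_obj['Numbers/squares'].shape       # array-like attributes
    first_three = file_obj['Numbers/squares'][:3]   # slicing reads a NumPy array

print(group_keys, member_keys, shape, first_three)
```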

For convenience, it's possible to print the full contents of the file using .visititems:


In [21]:
def print_attrs(name, obj):
    print(name)
    if isinstance(obj, h5py.Group):
        print(obj)
    if isinstance(obj, h5py.Dataset):
        print(obj[()]) # [()] reads the entire dataset into memory
        
with h5py.File("01-data_write.hdf5", 'r') as file_obj:
    file_obj.visititems(print_attrs)


Numbers
<HDF5 group "/Numbers" (2 members)>
Numbers/CSV
[[  2.00000000e+00   2.40000000e-02   4.00000000e+00   4.80000000e-02]
 [  4.00000000e+00   2.80000000e-02   8.00000000e+00   1.12000000e-01]
 [  6.00000000e+00   3.20000000e-02   1.60000000e+01   2.56000000e-01]
 [  8.00000000e+00   3.60000000e-02   3.20000000e+01   5.76000000e-01]
 [  1.00000000e+01   4.00000000e-02   6.40000000e+01   1.28000000e+00]
 [  1.20000000e+01   4.40000000e-02   1.28000000e+02   2.81600000e+00]
 [  1.40000000e+01   4.80000000e-02   2.56000000e+02   6.14400000e+00]
 [  1.60000000e+01   5.20000000e-02   5.12000000e+02   1.33120000e+01]
 [  1.80000000e+01   5.60000000e-02   1.02400000e+03   2.86720000e+01]
 [  2.00000000e+01   6.00000000e-02   2.04800000e+03   6.14400000e+01]]
Numbers/MAT
[[ 94 116 138 160   2  24  46  68  90]
 [114 136 158  18  22  44  66  88  92]
 [134 156  16  20  42  64  86 108 112]
 [154  14  36  40  62  84 106 110 132]
 [ 12  34  38  60  82 104 126 130 152]
 [ 32  54  58  80 102 124 128 150  10]
 [ 52  56  78 100 122 144 148   8  30]
 [ 72  76  98 120 142 146   6  28  50]
 [ 74  96 118 140 162   4  26  48  70]]
Text
<HDF5 group "/Text" (1 members)>
Text/lorem
b'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nam leo purus, interdum sed interdum quis, tincidunt ac nibh. \n\nMaecenas a purus massa. Nunc a augue augue. Donec in felis commodo lectus convallis elementum sed vitae ipsum.'

h5py also allows metadata (attributes) to be stored alongside the data - check the h5py documentation for more info: http://docs.h5py.org/en/latest/index.html
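
As a brief sketch of what such metadata can look like, h5py exposes a dictionary-like `attrs` object on groups and datasets. The file name, attribute names and values below are hypothetical and chosen only for illustration:

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical temporary HDF5 file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'example_attrs.hdf5')

with h5py.File(path, 'w') as file_obj:
    dset = file_obj.create_dataset('data', data=np.arange(4.0))
    # Attach hypothetical metadata to the dataset:
    dset.attrs['units'] = 'seconds'
    dset.attrs['sample_rate_hz'] = 100

with h5py.File(path, 'r') as file_obj:
    units = file_obj['data'].attrs['units']
    rate = file_obj['data'].attrs['sample_rate_hz']
    attr_names = sorted(file_obj['data'].attrs.keys())

print(attr_names, units, rate)
```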